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Chapter I. INTRODUCTION 



A. Description of the Problem 



The problem addressed in this thesis is the 

non-parametr ic estimation of population auantiles. Given a 

random variable X with continuous distribution function 

F (•) , we define the a-quantile s as the solution to the 

a 

equation 



(D 



F (s ) = a 
a 



for some given value of a between 0 and 1. We shall assume 

in what follows that s is unique, i.e. that we are dealing 

a 

with continuous or partly continous distributions. 

Completely discrete distributions with relatively small 
numbers of atoms present a much simpler estimation problem. 
Quantiles find application, for example, in testing 

statistical hypotheses and in characterizing the extreme 
values of the distribution of X when a is near 0 or 1. 



At the outset we note that there is a related problem, 

namely, given a value s, to estimate the quantity p given 

s 

by 



(2) F(s) = p . 

s 

The value p found in this way will be called a perce nt ile . 
s 

Percentiles may be used, for example, to find the power of a 
statistical test under a non-null hypothesis. By way of 
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* 

contrast we note that a is the known value in (1) while s is 
the known in (2) . 

The non-par ametric estimation of percentiles is 

relatively straightforward; the number of values of the 

random variable less than s in a random sample X , X , ... , 

12 

X is clearly a binomial random variable with parameters n 
n 

and p so that this number divided by n is an unbiased 
s 

estimator of p . 

s 

If the distribution function F(») in (1) is completely 

known, finding s becomes a Droblem of numerical 

a 

approximation, i.e. one must evaluate 

(3) s = F-i ( a ) . 

a 

Note that if the random variable X has an infinite support 
the slope of F(*) will be very small in one or both tails of 
the distribution (i.e. as the quantile level a approaches 0 
or 1) ; this means that in evaluating (3) for extreme 
quantiles one is likely to encounter serious numerical 
instabilities. If the distribution function F (• ; 0 ) is 
known except for a finite vector 9 of unknown parameters we 

may still proceed as in (3) provided we have some estimate 9 
of the parameters. The resulting parametric estimate of s 

a 

is given by 



(4) s = F-M a; 0 ) . 

a 



The properties of s will depend on both the underlying 

a 
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distribution F{») and the nature of the estimate 9; tne 
sampling variation of 0, however, is likely to increase the 
numerical difficulties with extreme quantiles. 



If nothing is known about F (•) , one must resort to 
non-parametric or distribution-free methods for estimating 

s . Non-parametric quantile estimation is considerably more 
a 

complex than non-parametric percentile estimation. Two 
solutions have been proposed for this problem (Goodman, 

Lewis and Robbins [14]) : the order statistic estimator, s , 

a 

and a class of stochastic approximation estimators, s . 

a 



The order statistic estimator is obtained by sorting 

the random sample X, X, ... , X into order, thus 

12 n 

determining the order statistics X , X , ..., X 

(1) (2) (n) 

Then the estimator is 



(5) 



A 

S 



where [z] denotes 



[5]) that s has 



X ([a(n+1) ]) ' 
the integer part of z. 
an asymptotically norma 



a 



1 



It is known (David 
distribution with 



(6) E[ s ] = s + 0 (1/n) 

a a 



and 
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( 7 ) 



Var[s ] = 




+ 0(n-2) , 



a 



a 



where f (x) = F' (x) is the density function of the random 
variable X. Unfort unately, the time required to order a 
complete sample of size n is proportional to n In n; thus 
the computational effort for this estimator increases faster 
than the sample size. Furthermore, considerations of finite 
computer memory size limit order statistic estimators to 
samples of perhaps 10,000 observations (less if several 
distributions must be investigated at once as might be the 
case in a systems simulation study). We discuss some other 
considerations relating to order statistic estimators in 
Chapter III; because partial sorting can be done in time 
proportional to n some improvement is possible, but these 
estimators still suffer from serious shortcomings. 

To overcome these drawbacks, we consider a sequential 
estimation scheme. This may be defined by a sequence of 
functions {h } ; our estimates are given recursively by 



where s (j) is the estimator at step j of the procedure. In 



the sequel, we denote this j-th sequential estimator by s , 



suppressing the dependence on a when this will cause no 
confusion. 



n 



( 8 ) 



s (j+1) = h . (s (j) , X . ) , j=1 

a g a g+ 1 



/ • • • / 



a 
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B. Stochastic Approximation Estimators 



The most important class of functions to be used in 
sequential quantile estimation schemes are stochastic 
approximation estimators. There is an extensive literature 
on so-called stochastic approximation methods; these methods 
are intended to find the root x = © of the regression 
function 

(9) E [Y(x) ] = H (x) = a, 

where the only information available consists of independent 
observations on the random variable Y (x) . We note that this 
is a more general problem than the quantile estimation 
problem considered here. Most work on stochastic 
approximation has been concerned with specifying conditions 
under which the sequence of estimators converges 
probabilistically to the correct value. Many of these 
conditions are trivially satisfied in the quantile 
estimation case; for example, the regression function will 
always be bounded since it is a distribution function, F(x) . 

The simplest type of stochastic approximation quantile 
estimators are based on the work of Robbins and Monro [30]. 
They are defined by the relationship 



(10) s =s - a Y ( s ) , n = 1 , 2 , . . . 

n+1 n n n n 

In this formulation (a } is a sequence of positive constants 

n 

of the form 

(11) a = 1 , A > 0, 

n nA 
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and Y (5 ) is a random variable which depends only on X and 
n n n 



s and which is defined by 
n 



( 12 ) 



The initial 
arbitrarily 



Y (s ) = - a 

n n 

1 - a 

estimate s and the 

1 

or at random. 



if X > s 
n n 



if X < s 
n n 



parameter A 



may be chosen 



The procedure given by (10) is called a Robb ins-Monro 
(RM) process; under suitable conditions (which are satisfied 

by (10)-(12) as long as Var[s^] < oo ) , Blum [2] and Dvoretzky 

[7] have shown 

(13) s — > s almost surely (a.s.), 

n a 



(14) 


limE[(s - s ) 2 ] = 0 . 
n->=o n a 






Furthermore 


, Sacks [33] has shown that 


if F ( x ) 


has a 


continuous 


derivative 


f (x) at s then 
a 






(15) 


L 

s — > 
n 


N ( s , a_(1 - a]_ 

a nr[2tJs~T- 
a 


17 ’ ' 




as long as 


0 < A < 


2f(s ) . The asymptotic variance is 
a 


minimized 


by taking 


A = f(s ); this results in 


the same 



a 
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asymptotic normal distribution for s as for the order 

n 



statistic estimator, s . 

a 



C. Improving the RM Estimators 



An intuitive discussion of the operation of the RE 
process (10) will serve to point out ways in which the 
resulting quantile estimators can be improved. First, we 



note that the sequence {s } is a Markov process, although a 

n 



non-homogeneous one. Moreover, as long as A is fixed, s 

n 

n 

may take on one of only 2 distinct values at stage n. This 

/ ' 

is because Y is a discrete random variable: it increases 

n 

the estimate value ("step up") when the latest observation 

is larger than the current estimate and decreases the value 
("step down") when the observation is smaller. 



The actual magnitude of the step is governed by the 

gain sequence {a }. The factor 1/n in (11) is necessary so 

n 

that successive steps become smaller, thus allowing the 



CO 

estimator to converge; however, since 2 (1/n) - 

n = 1 



sequence of estimators can reach any quantile 



starting from an arbitrary initial value s . Note 



that if s is still far from s for even moderately 
n a 



oo the 
value s 

a 

however 
large n. 
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a prohibitive number of steps may be needed to obtain a 
reasonable estimate. 



The first improvement 
suggested by Kesten [18]. 
required to converge to the 

s - s becomes large, the 
n a 



to the basic EM process was 
To cut down the number of steps 
true value after the difference 

divisor n in (11) is modified so 



that it is increased only when the current step direction 

differs from the step taken at the previous stage. This 
suggests that we have "straddled" the true quantile value. 
Although the stochastic approximation estimator obtained in 
this way has the same asymptotic distribution as the EM 
estimator (Davis [6]), its convergence properties in small 
samples seem to be superior (Cochran and Davis [4]; Davis 
[6]). The Kesten procedure does have the disadvantage, 

however, that it often fails to reduce the step size even 

* 

when s is close to s . The optimum procedure is probably 
n a 



to keep the step size constant until s is "close" to s and 

n a 

then to carry out the usual RM procedure. Such a "delayed" 
process has been studied by Cochran and Davis [4] and Davis 

[ 6 ]. 



A related difficulty with the basic RM process is that 
it does not work well at all for the estimation of even 
moderately extreme quantiles (a < 0.25 or a > 0.75). This 
problem was first noted by Wetherill [36]; he traced the 
difficulty to the slow rate of increase of the harmonic 

series 2 (1/ n ) when k >> 1 . 
n=k 
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A solution to this problem was developed by Goodman, 

Lewis and Robbins [14]. Instead of carrying out the 

operation (10) for every sample value X we use only the 

n 

maximum (or minimum for a < 0.5) of some number of 
observations, say v, where v is chosen so that 

v 

(16) a = a’ = 0. 5 

The RM process can then be applied to estimate the 
a’-guantile of the maxima (or minima) ; this has the same 
value as the a-guantile of X. The basic idea is to use a 
data transformation to shift the problem to the estimation 
of a population median, for which RM is known to be 
well-behaved. It is unnecessary to go all the way to the 

median; good results are obtained for 0.3 < a' < 0.7. 
Convergence rates are apparently much improved by this 
procedure; the cost, as Goodman, Lewis and Robbins [14] 
show, is an inflation of the asymptotic variance 



In 



(17) Var[ s' ] = Var[ s 

n n 

most cases the inflation is 



] a M -_aM 
va ' ~[T - - - a) 

less than 40 



Ot 

* 



A natural extension of this so-called maximum 
transformation process is to consider a next- to- maximum 
transformation, i.e. applying the RM process (10) to the 
second largest (or smallest) in a sample of size w where 

V ** 1 w 

(18) wa - (w-1)a = a" = 0.5 

The appeal of this procedure in dealing with highly skewed 
real world data is that it may give a more robust estimation 
procedure. Once again, there is an inflation of the 
asymptotic variance 
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( 19 ) 



Var[s"] = Va r[ s ] a"(1 -_a”L 

n n 2 Zw-3 3 

w(w-1) a (1 -a) 



The inflation is somewhat greater in this case than for the 
maximum transform but it may still be limited to less than 
50 % by the proper choice of w. 



In the remainder of this thesis, a single prime (as in 

a’ or s’) will denote an estimate or parameter which is 
n 

based on the maximum transform while the double prime (e.g.. 



s") will denote a next- to-maximum transformed value. Except 
n 

for equations (17) and (19) , a subscript n appended to a 
primed value will indicate the number of steps taken by the 
corresponding stochastic approximation process and not the X 
sample size, which will be larger. In fact, we will need at 

least n«v X observations to obtain s'; more will be needed 

n 



if the initial estimate s is chosen at random. 

1 



For efficient estimation of a set of several quantiles 
we prefer to use v (or w) values for higher quantiles which 
are integral multiples of the values for lower quantiles; 
this greatly simplifies determination of sample maxima and 
minima. In this research, a set of 19 quantiles has been 
arbitrarily selected; these include the 16 quantiles of 
Goodman, Lewis and Robbins [14] together with the median 
(a = 0.5) and the quartiles (a = 0.25, 0.75). The values of 
v and w for each of the transformation schemes together with 
the respective variance inflation factors are shown in Table 
I. 
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Having 


dealt 


with 


the effec 


ts of the 1/n term in the 


gain sequence 


{a } 
n 


we now 


consider 


the parameter 


A. The 


0 (n -1 ) variance 


implied 


by (15) 


will re 


suit whe 


n A is not 


too large, i. 


e. when the initial step size 


is not 


too small. 


It is known 


(Major and 


Revesz 


i — i 

fsJ 

i » 


that th 


e order of 


a 


V 


a ' 


V* 


V 


a" 


V" 


.001 


672 


.4895 


1.425 


1536 


.4542 


1 . 476 


.002 


336 


.4897 


1.425 


768 


.4543 


1. 476 


.005 


112 


. 4296 


1.338 


384 


.5726 


1 . 608 


.010 


56 


.4304 


1.336 


192 


.57 32 


1 . 608 


.020 


28 


.4320 


1.331 


96 


. 5745 


1 . 606 


.025 


28 


.5078 


1.437 


48 


. 3383 


1 . 423 


.050 


14 


. 5123 


1.426 


24 


.3392 


1 . 420 


.100 


7 


.5217 


1.402 


12 


. 34 10 


1.414 


.250 


1 


. 2500 


1.000 


6 


.4661 


1.414 


-.500 


1 


. 5000 


1.000 


3 


.5000 


1 . 33? 


.750 


1 


.7500 


1.000 


6 


.5339 


1. 414 


.900 


7 


.4783 


1.402 


12 


.6590 


1.414 


.950 


1 4 


.4877 


1.426 


24 


.6608 


1 . 420 


. 975 


28 


.4922 


1.437 


48 


.66 17 


1 . 423 


.980 


28 


.5680 


1 .331 


96 


.4255 


1 .606 


.990 


56 


.5696 


1 .336 


192 


.4268 


1.608 


.995 


112 


.5704 


1 .338 


384 


.4274 


1 . 608 


.998 


336 


.5103 


1.425 


768 


.5457 


1.476 


.999 


672 


.5105 


1.425 


1536 


. 5458 


1 . 476 


Table I. Sample 


sizes. 


t ransf or m 


ed levels and 


va riance 



inflation factors for maximum transformation (v, a' and V') 
and next- to-maximum transformation (w, a" and V") stochastic 
approximation quantile estimation designs. 
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convergence may be substant 

When the optimum value A 

acts like steepest descent 

the steps are the same as 
to the distribution functi 

(Fabian [ 10 ]) . 



ially worse 


when 


A > 


2f (s ) . 
a 


= f(s ) is chosen, 
a 


the RM 


process 


approximation 


with 


small 


steps ; 


those for a 


linear approx 


imation 


on through 


the 


point ( 


s , a) 



a 



Evidently the initial choice of A has an imp 

influence on the efficiency of the basic RM process, 

general the magnitude of the effect cannot be dete 

since f (s ) is unknown. In fact, the asymptotic nor 
a 

of s stated by (15) cannot even be asserted since it 
n 

not be known whether A < 2f (s ) . For this reas 

a 

consider procedures which simultaneously estimate s 

f ( s ) and are thus more generally appl ica bl 
a ~ 



Practical application of stochastic approxi 
quantile estimation then requires that we have 

starting value s and an estimate of f (s ) . Although 

1 a 



is an improvement over order statistic estimators i 



speed and memory, the additional val 
stochastic approximation case in 
complexity. In fact the selection of 
critical to the feasibility of st 
quantile estimation and is one of 
addressed and solved in this thesis. 



ues required i 
troduce a degr 
these two valu 
ochastic approxi 
the main pr 



or tan t 
but in 
r mined 
ma I i t y 

will 
on , we 



_ and 
a 



mation 
both a 

there 

n both 

n t h e 
ee of 
es is 
mation 
oblems 
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D. Venter's Method and Confidence Intervals 



The first method for simultaneously estimating s and 

a 

f (s ) is due to Venter [37]. Note that although this solves 
a 

the problem of finding a suitable A value we must still 

select an initial estimate s ; this is not nearly as crucial 

1 

or as difficult as the choice of A. In Venter's method we 
observe two X values at each stage of the procedure and 
determine 



(20) Y' 

n 



and 



a if X > s + c 

2n-1 n n 



a if X < s + c 

2n-1 ~ n n 



(21) Y" = - a if X > s - c 

n 2 n n n 



1 - a ifX < s - c 
2n n n 



The sequence [c } is a sequence of positive constants called 

n 

the finite difference sequence; it must satisfy 



r 

(22) c n --> c, c> 0 , 0.25 < r < 0.50 . 

n 



A sequential estimator of f (s ) is then given by 

a 



(23) 



A 

n 



n 

1 . 2 , 
n g = 1 



Y’ - Y" 




3 
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Finally to estimate s we apply the basic RM recursion 

a 

relation (10) with 



(24) Y = (Y ' + Y") / 2 

n n n 

The latest estimate A of f (s ) is used in the gain sequence 

n a 

in the place of the arbitrary value A, i.e. we use the 
random value 1/(nA ) for a in (10). In a practical 



n n 

application of the method to quantile estimation, we 

accumulate only the sum in (23) thus obtaining nA ; this 

n 

quantity is used directly as the denominator of the gain 
sequence (1 1) . 



The chief practical difficulty encountered in using the 

estimator (23) is that A may become negative, in which case 

n 

the RK process will take steps in the wrong direction, or 

else A may get too large in which case the 0(n -i ) variance 
n 

will be lost. For this reason. Venter uses as an estimate 



of f(s ) in the gain sequence the value A*-, where 



a n 

$ * * 

(25) A = a if A < a 

n n 

* * 
A if a < A < b 

n n 

* * 

b if A > b 

n 



* * 

and where it is known a priori that a < f (s ) < b . As 

a 

* 

long as b is not too large, we have (Venter f 3 7 1 ) : 
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( 26 ) 



s --> s a . s . , 

n a 



(27) 



and 



A --> f (s ) a. s. , 
n a 



(28) 




s / 

a 




Thus, the Venter 
normal distribution 



estimator has the same asymptotically 
as the other stochastic approximation 



estimators we have considered. (Recall that s is based on 

n 



a total X sample of size 2n in this case.) 



The advantage of the Venter procedure is that we no 

longer need an independent initial estimate of f(s ) since 

a 

the procedure converges for any initial value of f(s ) in 

a 

* * 

the interval (a ,b ). We also obtain (asymptotically) the 
minimum possible variance and we have the additional 



estimate A which may be used to determine a confidence 
n 

interval on s . Sielken ([34] and [35]) has investigated 

a 

the application of the Venter process to the estimation of 
confidence intervals and stopping times. 
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* * 

The problem of finding the interval (a ,b ) was solved 



by Fabian [9]; he suggested the use of 



* -L 

(29) a =c 1 n ' 0 < L < 1/2, 

* 

b = C 2 log (n+ 1 ) , 

0 < C < C . 

1 2 



From a practical point of view, we may establish the lower 

bound by setting nA to some small positive constant 

n 

whenever the accumulated sum becomes negative. Venter's 



* 

results also indicate that the upper bound b may be 



arbitrarily large when the density function is analytic in 



some neighborhood of s , so that this does not represent a 

a 



restriction in many applications. 



E. A New Method 

A modification of the basic RM stochastic approximation 

process along the lines of Venter's work is the major 

contribution of this thesis. The new process is 

asymptotically equivalent to the other processes discussed 

in this Chapter but its finite sample properties seem to be 

much better. Just as in the case of the Venter process, we 

obtain an estimate of f (s ) which is pluqged recursively 

a 

back into the basic stochastic approximation relation; a 
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different technique for density estimation is employed, 
however . 



In seeking an estimate of an unknown density function 
at some point one is lead to the work of Rosenblatt [32] and 
Parzen [28] on kernel estimators. A kernel function W {•) is 
a bounded integrable function with 

(30) J W (x) dx = 1 

An example is the triangular weight function 

(31) W (x) = 1 - | x| if | x| < 1 

0 otherwise. 



The empirical density function estimator at the point x is 
then given by 

<e.' 

n ,-x — X -j 

( 32 ) f (x) = 1 .2 W j j J , 

n nb g = 1 *- 5 -» 

n n 

where [b } (called the "bandwidth" sequence) is a sequence 
n 

of positive constants which tends to zero with increasing n; 
for example. 



-1/3 

(33) b = b n , b > 0 

n 



We now define an estimator B of f (s ) using a kernel 

n a 

density estimator: 



(34) 



B = 



n 

1 . 2 , 

n g = 1 



- X 



■Cv-4 



and establish a new stochastic approximation 
uses the RM recursion formula (10) with B 



process which 
replacing A in 
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* 

the gain sequence (11). 

One advantage of the new density estimator (34) is that 
we are able to take twice as many steps as in Venter’s 
method for the same sample size; this seems to permit faster 
convergence in small samples. Some computational experience 
with the new estimators shows them to be far superior to any 
other known non-paramet ric technique for quantile 
estimation. Almost sure convergence and asymptotic 
normality for the new procedure are established in Chapter 
II. 



F. Scope of Research 



The goal of this 
application of the stoch 
described in this Chapter 
quantile estimation in the 
method which is fairly robu 
distribution F(*). The ch 
stochastic approximation 
procedure as well as the ba 
in some cases the estimator 

as many as 20,000 steps. I 



process 


(10) 


has the wo 


rst 


the imm 


edia te 


neighborh 


ood 


astrono 


m ical 


num ber 


of 


unf ortu 


nate 


tendency 


c 


approxi 


mation 


est imato 


rs 


applica 


tions. 







Encouraging results 
estimator proposed here, pa 
with the maximum transforma 



thesis is to investigate the 

astic approximation techniques 

to the problem of non- parametric 

hope of developing a practical 

st with respect to the underlying 

ief disadvantage in using any 

estimator - including Venter's 

sic RM process - seems to be that 

s are nowhere near s , even after 

a 

t is in this case that the RM 
convergence rate because reaching 
of the true value may require an 
additional steps. Unless this 
n be overcome, stochastic 
annot be recommended in practical 



have been achieved wit 
rticularly when it is 
tion technique and when 



h the new 
combined 
some care 
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is taken in 


selecting the 


starting value, s^. 


W hen 


an 


entire set 


of 


quantiles 


is to be estimated 


a furt 


her 


improvement 


is 


possible. 


Since the quantiles 


are 


by 


definition 


ordered, a gross error in a single est 


ima te 


can 



often be detected because the erroneous value is usually out 
of order with respect to the other estimates in the set. In 
this case alternate types of estimate can be used to replace 
the erroneous one, thus bypassing the lengthy path that the 
stochastic approximation process requires to reach the true 
quantile value. Assuming that only one or two of the set of 
estimates is in error, this approach should overcome the 
tendency of the stochastic approximation process to "blow 
up" . 



The thesis is organized as follows: in Chapter II, we 
establish the asymptotic properties of tne new estimator and 
show it to be equivalent to the Venter process as n ---> co . 
Chapter III describes some practical considerations relating 
to quantile estimation in finite samples of data using both 
order statistic and stochastic approximation estimators, 
while Chapter IV describes the results of an extensive 
digital computer simulation undertaken to determine the bias 
properties of the new estimator. Chapter V discusses the 
simultaneous estimation of an entire set of population 
quantiles and considers several techniques such as 
James-Stein estimation and isotonic regression to exploit 
the order relationships which are known to exist in such a 
set of estimates. Chapter VI discusses the estimation of 
functions of quantiles, in particular the estimation of the 
level of a test based on a given statistic and the 
estimation with the same simulation data of the power of the 
test. The last Chapter summarizes the work and discusses 
possible applications for the methods develop 1. 
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In summary, this thesis describes a method for 
estimating an entire set of quantiles with their 
corresponding densities for any statistic or other random 
quantity. The method is quite fast and uses a small fixed 
amount of memory; it is robust enough to be used as a basic 
building block in computer simulation programs. 



G. Limitations of Research 

In this thesis we deal only with non-para me trie 
quantile estimators; substantial improvements are often 
possible if we know enough about the underlying distribution 
function F (•) to apply maximum likelihood or other 

parametric estimates. For example, if F(®) is the 
exponential distribution then 

(35) s = -jl[ X] In (1 - a) 

a 

(where p[X] d enotes the sample mean) is the maximum 

likelihood estimator of s and is therefore asymptotically 

a 

fully efficient. Clearly, 



(36) E[s ] = - p In ( 1 - a) 

a 

= s , 
a 



so that s is unbiased; furthermore, 
a 



(37) Var[ s ] = 1 [ u In (1 - a) ]* 

a n 

= s 2 / n, 
a 
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which is at most 65 % 

non-parametric variance, 
relative efficiency of the 
becomes much greater. 

This work is also li 
continuous or partly con 
random variable X has a com 
a-guantile may not exist 
this difficulty we may r 
solution of 

(38) inf F (s ) > 

s a 

which reduces to (1) in t 
all clear, however, that 
reasonable interpretation, 
atoms. 



as large as the asymptotic 
As a approaches 0 or 1 the 
parametric estimator in this case 

mited to the consideration of 
tinuous distributions. When the 
pletely discrete distribution its 
or may not be unique; to overcome 
edefine the a- quantile as the 

a r 

he continuous case. It is not at 
the solution to (38) has any 
particularly if X has only a few 



The methods developed 
only pseudorandom simulatio 
proposed applications for 
can certainly be used but 
reasonable results from 
estimation are so large tha 
sufficient observations be 
the next-to- maximum transfo 
dealing with real data t 
the artificial samples used 
difficulty with outliers 
Lewis [12] point out the ma 
any problems caused by outl 

One final limitation o 



here have been investigated using 
n data and this is typical of the 
the techniques. Real world data 
the sample sizes required for 
stochastic approximation quantile 
t only in special cases will 
available. It seems likely that 
rmation will prove more useful in 



han was 


found to be 


the 


case with 


here s 


ince there is 


us 


ually more 


in the 


former case. 


As 


Gaver and 


ximura 


transform wi 


11 


intensify 


iers. 








f this 


work is that 


we 


consider 
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only samples with sequenti 
will clearly not be the cas 
many kinds of simulation 
our methods in the simulati 
techniques of Iglehart [ 
dependent observations is 
considered further here. 



al t observations; this 
e for much real world data or for 
studies. We may be able to apply 
on case by using the regenerative 
16 ] but the general problem of 
much more complex and is not 



31 



Chapter II. ASYMPTOTIC PROPERTIES OF THE NEW ESTIMATOR 



A. Definitions and Preliminaries 



We wish to estimate the solution x = s to 

a 

F (x) = a, 0 < a < 1, 

where F (•) is the distribution function of the random 
variable X. We assume: 

FI. F (x) has a derivative f(x) which is 

continuous in some neighborhood of s with 

a 

f (s ) = £ > 0. 

a 



F2. F" (x) exists and is bounded in some 

neighborhood of s . 

a 



Note that (FI) is sufficient for s to exist and be unique. 

a 



A sequential estimation scheme 

estimate of s at step n. The initial 
a 

arbitrarily (or at random with E[ § 2 ] 
recursion 



is used with s the 

n 



estimate s is chosen 

1 



< 00 ) and we apply the 



(1) s = s - a Y , 

n+ 1 n n n 



32 



No n-parametric Quantile Estimation Through 
Stochastic Approximation 



where Y is given by 



n 




(2) 


Y = -a if X > s , 

n n n 

= 1 - a if X < s 

n n 


In (2) X 

n 


is a random variable with distribution F («) which 


is assumed 


independent of (s ; X , ,X }. 

11 n- 1 



The gain sequence {a } is given by 

n 



t 

(3) 


a = 1 / nd , 
n n 



where d is essentially a "bounded" kernel density estimator 
n 

(see Rosenblatt [32] or Parzen [28]): 



(4) 


-L 

d = Max [ Z n , Min { B , C log(n+1) } ], 

n '1 n 2 


with 0 < L 


< 1/4 and 0 < C < C . The estimator B is 

1 2 n 


defined by 




(5) 


n 

B = 1 .2, w . ' 
n n g=1 g 


(6) 


r s - x *, 
w. = 1 W 13 i] 

3 3 


where [b } 
n 


is a bandwidth sequence of positive constants 


satisfying 




(7) 


-g 

b = 0 (n ) , 1/5 < g < 1/2 . 

n 
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The function W(*) is called the kernel function; it is 
assumed to satisfy 



W1 . 


W (u) > 


o, 


- 00 < u < 00 . 


W2 . 


sup 

-oc< u<oo 


W (U) 


= K < 00 . 


W3 . 


fee 

/ W (u) 

J- CC 


du 


= 1. 


W4 . 


1 im 

| U | ->oc 


| uW (u) | = 0. 


that 


W <•) is 


a 


probability 



assumptions . 



In what follows, we show first that s -> s almost 

n a 

surely (abbreviated a.s.) and that d -> 0 a.s.; then, using 

n 

a theorem of Fabian [9], we develop the asymptotic 

distribution of s . Throughout, {Q,S , p} will be a 

n 

probability space and 8=a(s;X,...,X ) C S a sequence 

n 1 1 n-1 

of a-fields (i.e., the smallest cr-field with respect to 
which the indicated variables are measurable). 

We begin by rewriting the basic relation (1) in the 

form 

(8) s = s - T + 0 , 

n+ 1 n n n 

in which we define 
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( 9 ) 



T = a [ F (§ ) - a], 

n n n 

U = -a Z , 
n n n 

2 = Y - E[ Y | B ] 

n n n n 

= Y - F ( s ) + a 

n n 



We note that jZ | < 1. 

n 



Since we will deal with sequences of the form (9) , we 
begin by stating two lemmas relating to sequences of this 
type. Proofs may be found in Loeve [24]. 



Lem ma 2 (Loeve) Let {V } be a sequence of random 

n 

00 CO 

with 2 Var[V ] < °o ; then if £ E[ V | V , . . . ,V ] 
n=1 n n=1 n 1 n-1 



00 



a . s . , 2 

n = 1 



V converges a.s. to a random variable, 
n 



variables 

converges 



n 



00 

Lemma 2 (Loeve) If c(n) ->« and 2 _ 1 Var[ V ] < oo then 

n=1 c^nT ? n 



1 

cJnT 





E[ V | V 
k 1 



V ]} — > 0 a.s. 

k-1 



n 



B. Convergence of s 

n 

The proofs in this Section follow the lines of Blum's 
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work [2], In fact, the convergence of s follows at once 

n 



from the bounds indicated by (4) (Fabian [9]) if we are 



willing to adopt a slightly different definition of B . Now 

n 

we deal with the relation (8) and show 



00 

Lemma 32 d converges a.s. to a random variable. 
n=1 n 



Proof : 

Clearly , 

Var[ U ] < E[ a 2 Z 2 ] 
n n n 

<1 E[ Z 2 / d 2 ] 
n n 

2-2L 

< 1 / (n C 2 ) , 



00 

so that 2 Var[U ] < °° . 
n = 1 n 



Now X is independent of {§ ; X ,...,X } and since 

n 1 1 n-1 



these random variables uniquely determine d we have 

n- 1 



E[Z /d | B ] = o a.s. Thus, 
n n-1 n 



E( U j 8 ] = S[ -a Z | 8 ] + _1 E[ Z /d | 8 ] 

n n n n n n-T n n-1 n 



= E[ {1/ (n-1) d - 1/nd } Z | 8 ] 
n-1 n n n 



= 1 E[ { (n- 1) d - nd } Z /d d j 8 ] . 

nJn-1Y n-1 n n n- 1 n n 



Now we use the definition (4) of d to set an upper bound: 

n 
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|E[ r J I B ]| < fc2 n L (n- 1 ) L ] 

n n nTn-TT *- 1 -» 



n7n-Tf 

• E[ | nd - (n-1) d | | Z | ! B ] 

n n-1 n n 

L- 1 L-1 -2 

< n (n-1) C E[ | nd - (n-1)d | | B 3 

1 n n-1 n 

where we have used the fact that |Z | < 1. The relationship 



| Max[a,b] - Max[c,d] | < Max[ |a - c|, |b - d| ] 
and the definition (4) then imply that 

1-L 1-L 

| nd - (n- 1) d j < Max ( |C n - C (n-1) |, 

n n-1 1 1 

InB - (n-1) B |, 
n n-1 

|C n log(n+1) - C (n-1) log nj }. 
2 2 

-L -L-1 

Now the first term here approaches C (1-L) n + 0(n ) as 



n --> co so in this case we have 



L-1 L-1 -2 -L -L-1 

f EC U | 8 ]l * n (n-1) C [C (1-L) n + 0(n )J 

n n 11 

L-2 

= 0 (n ) a . s . 



For the last term we get 



C n log(n+1) - C (n-l)log n = C log(n + 1) + C (n- 1 ) log (1 + 1) 

2 2 2 2 n 

< C^log (n+ 1 ) 



so that 

L-1 L-1 -2 

|E[U | B ]| < n (n-1) C C log (n + 1) 
n n 12 
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2L-2 

= 0 (n log n) . 



Finally we consider 



| nE - (n- 1) B I = w 
n n- 1 n 

< K / b 

n 

g 

= 0(n ) , 



in view of ( W 2) and (7) . Thus we conclude for this case 
that 



_ L-1 L-1 -2 

| E[ 0 | 8 ]| < n (n-1) C K / b 

n n 1 n 

2L+g-2 

= 0 (n ) . 



00 V. 

We thus have that 2 |E[(J | 8 ]| converges almost surely in 

n= 1 n n 

all three cases because of the definitions of L (4) and g 
(7) . An application of Lemma 1 then completes the proof. 

□ 



Lemma 4 (Blum) s converges a.s. to a random variable. 

n 



Proof : 



Iterating (8) back to s^ yields 



n n 

s = s-2r + 2u, 

n+1 1 j- 1 j j=1 j 



so that 



( 10 ) 



n n 

s + 2T=s + 2U converges a.s, 
n+1 j=1 j 1 j=1 j 



in view of Lemma 3. Next we show 
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(11) P r { 1 i m s = oo ) = o . 

n-> n 



Suppose, for example, there exists a sample sequence (s } 

n 



with lira s = oo ; then s < s for only finitely many n so 
n-> co n n a 



that T = a [ F (s ) - a] > 0 when n is large enough. Thus 

n n n 



lim [s + 2 T ] -~>co which occurs with probability 

n->oo n+1 j=1 j 



zero by (10). This establishes (11) and we similarly show 



( 12 ) 



Pr { lira s = 
n-> oo n 



- CO } = 0 . 



Now suppose the lemma is false; then there must exist 
sample sequences for which 



(13) 



_ n 

s + 2 I converges to a finite number 

n+1 j=1 j 



lira inf s < lim sup s 
n->°° n n-> 00 n 



Letting {s } be such'a sequence, we assume that lim sup s > 
n n 

s (a similar argument handles the case lim sup s < s for 
a n a 

then lim inf s < s by (13) ) . We then choose numbers c 

n a 



and d such that c > s and lim inf s < c < d < lim sup s . 

an n 

n 

In view of (5)- (7), a -> 0; and since s + 2 r 

n n+1 j=1 j 
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converges, we may choose N so that N < n < m implies 



a < d - c , 
n “2“ 



(14) 



m- 1 

| s - s + 2 T | < d_-_c . 

m n j=n j 2 



Now we select m and n with N < n < m such that 



s < c, 

n 



s > d, 
m 



c < s < d for n < j < m . 

j 

We may clearly do this. Thus, 




m-1 

(16) s - s < d_-_c - 2 T < d_-_c - T , 

m n 2 j=n j 2 n 



since T = a[F(s)-a]>0 for s > c > s . Now if 
3 3 3 j a 



s we obtain 
a 



s - s < d - c , 
ra n ~~7 



in contradiction of (15) which implies s - s > a - 

m n 



s < s we have 
n a 



-T =a[a-F(s)]<a < d_-_c 

n n n n 2 



from (14) ; thus (16) becomes s - s < d - c, which 

m n 



s > 
n 



c. If 



again 
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contradicts (15) . This means no sequence (s } can satisfy 

n 

(13), thus establishing the lemma in view of (11) and (12). 

n 



Theorem _1 (Blum) s --> s a.s. 

n a 



Proof : 



We suppose Pr { lim s = S) = 1 as guaranteed by Lemma 4 and 

n-> oo n 

we also suppose that Pr{S ¥= s } >0. Now we choose c and d 

a 

with s < c < d < cc and Pr {c < S < d} >0. (Alternatively 
a 



we take -oo < c < d < s .) Then for every sample sequence 

a 



(s } for which lim s = S, c < S < d, we have c < s < d 
n n->“ n n 



for almost all n. Lemma 3 and Lemma 4 show that 



n n 

(17) E T = T a .[ F(s .). - a] converges ; 

3=1 3 3=1 3 3 ' 

however, F (s ) - a > F(c) - a > 0 for almost all j so . (17) 

3 

must diverge because a > {j C log ( j + 1 ) } ~ 1 ; this follows 

3 2 

from the definitions (3) and (4) and the fact that C > C . 

2 1 

Thus, 



JL a. > .2 (Cl log ( j+ 1 ) } -1 = 0[ log (log n) ] 

3=1 3 3=1 1 
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This contradiction establishes the theorem. 

□ 



C. Convergence of d 

n 

We begin by proving three preliminary Lemmas. 

Lemma 5 Let (t (x) } be a sequence of measurable functions 
n 

uniformly continuous for every n > N in some neighborhood of 
the point X € R with 

(18) lim t (X) = t (X) ' 

n-> oo n 

and {X } a sequence of random variables with 
n 

(19) X — > X a . s . , 

n 

where X € R is a constant. Then 

(20) t (X ) — > t (X) a. s. 

n n 

Proof : 

The convergence (18) implies that for each > 0 , whenever 

n > N ( n ) we have 

1 

(21) |t (X) - t (X) | < *7/2. 

n 

The uniform continuity of t (X) for n > N likewise implies 

n 

that given >7 > 0 there exists an e > 0 depending only on f) 
such that 
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(22) |X (co) - X| < e ==> |t (X ( ) ) - t (X) | < rj / 2/ 

n n n n 

for each co 6 Q . Combining (21) and (22) yields 

(23) |X ( co) - X| < e = = > |t (X (co)) - t (X) | < rj . 

n n n 

Now by Egoroff's Theorem (19) implies that for each 5 > 0 

there exists a set A-.cS with P (A-.) >1-5 such that 

o o 

X (co) converges uniformly in co for every co in A... 
n o 

Evidently then if n > N (e) , 

2 

e A-. ==> |X ( co ) - X| < e. 
o n 

Now since e in (23) depends only on >? , whenever n > N ( 77 ) = 

max [ N ( r\ ) , N^ (e) ] we have 

6> € A = = = > It (X (co)) - t (X) | < rj , 
o n n 

which means that t (X ) -> t(X) uniformly on A-. Since 5 

n n o 

is arbitrary, this means that t (X ) -> t(X) almost 

n n 

uniformly which implies (20) because of the equivalence of 

almost sure and almost uniform convergence (see Lukacs 
[25]) . 

□ 

\ 

Lemma 6 Let {X } be a sequence of bounded random variables 
n 

with 

(2H) X --> 0 a. s . , 

n 
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S = 



1 .2 x 

n g = 1 3 



Then S — > 0 a.s. 
n 



Proof : 

Because of (24) , given e > 0 there exists a set A c S 

e 

P(A ) = 1 such that 

e 



6 A = = > |X ( co ) | < e/2 
e n 



for all n > N(e,o>). Now for t > 0, 



(25) 



IS (co) I = 
N + t 



N+t 

i z x ( oj ) 

N+t 3=1 3 



I N 

s 1 .2 X .(co) 

N+t I 3=1 3 



| N+t 

+ 1 z X ( eu ) 

N+t j=N+ 1 j 



Now we take 



(26) 



C (N ,CO) = sup | X (co ) | < oo 

n< N n 



this follows from the hypothesis that (X } is bounded, 

n 

the lemma will hold for any sequence satisfying (26). 
(25) becomes 

|S (co ) | < N C (N, to ) + t e 

N+t N+t N+t 2 

< e + e = e 
2 2 

whenever we choose t > T (e,co ) . Thus, 



co e A ==> |S (co ) | < e 
e in 



with 



but 

Now 
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for all m > M(e,o> ) = N + T. Since P (A ) = 1, we conclude 

e 

that S -> 0 a. s. 
n 

□ 



Lemma 7 Under assumptions (FI) and (Wl) through ( W 4) the 
function 



(27) t (x) = 1 / W fx - x] dF(y) 

n b / «- 5 J 

n ^ “ n 



is uniformly continuous in some neighborhood of x = s for 

a 

every n > N. 



Proof : 

Suppose in accordance with (FI) that the density f (x) exists 

and is continuous for x*«I = [s-A,s+A] for some 

a a 

A >0. Following Parzen [28] we may rewrite (27) in the 



form 



t (x) - f(x) 

n 



where x 6 I and 
x € I, 

It (X) -f (X) I 

n 



L 



I Y I ^8 



[f(x-y) - f (x) 3 b-i W(yb-i) 



n n 



/. 



I y I >5 E 



"[ ] dF <y 



/ 



- f (x) J b~ 1 W (yb-i) dy 

I y I >6 n n 

8 is chosen such that 0 < 8< A. Thus 



— sup |f (x-y)-f(x) | / W (u) du 

lyl^S Hyl^S 



/. 



dy 



when 
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+ 



/. 



lyi >8 



4 1 a C l ] 

n n 



1 dF(x-y) 

TyT 



♦ f(x) 



/. 



I y I >5 



b-i 

n 



w ( y b - 1 ) 
n 



dy 



Jt (x)-f(x)| < sup I f (x-y) - f(x)| 
n I y I <5 



♦ i/ 

r •' 7 . > 



. sup I Z W (Z) I 

8 J z>8 ?b 

n 



/ 



dF (z) 



+ f(x) 



/ 



|z|> 8 /b 



V. (z) dz . 



Now given some e > 0 we may, by the continuity of f (x) on I, 
choose a 5 > 0 such that the first term will be less than 
e/3. Having chosen 8 we may then select N such that when 
n > N (W4) implies that the second term will also be less 
than e/3. Finally, ( W 3 ) allows us to conclude that the last 
term will also be less than e/3 when n is large enough. He 
thus have that 



sup 1 1 (x) - f (x) | < e 
x G I n 



when n > N(e) , i.e. t (x) is uniformly continuous on I. 

n 

□ 



Theorem 2 d --> P a.s. 
n 

Proof : 

In view of the bounds (4) it suffices to show that 

(28) B — > p a.s. 

n 

We first note that 
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- r y - x 

w (y) = 1 w_s 

n d L d J 



has a bounded variance whose bound is independent of y; 

dF (u) 



Var[ w (y) ] < 1 
n E* 

n 



f vz t-i-a] 

J n 

f a 



< K2 I dF (u) = K2 , 

B * J E? 

n n 



which follows from (W2) . Thus, 



00 



2 1 Var[ w ] < K2 2 (n b )-2 

n=1 n 7 n n=1 n 

which is finite by (7) . Lemma 2 with c = n then implies 

n 



(29) 



1 2 {w - E[V | 8 ]} — > 0 a.s. 

n 3 = 1 j j 3 



Now 



(30) 



n 

B = 1 .2 « . 

n n g=1 j 

n 

= 1 2, C». 

n g = 1 3 



E[w J 8 .]} +1 .2 E[w J B. ] 

3 3 n 3=1 3 3 



3 3 



= e[ 


r s - X 

l- ' 

j j 


N 


= J- 




dF (y) 


j 


%/ ^ , 

3 




= t 

j 


(s ) a.s., 

j 
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with t (•) given by (27). Now Parzen [28] has shown that 
(W1) - ( W4) and (FI) imply 

lim t (s ) = f (s ) = 0 . 

n-> co n. a a 

Clearly t (•) is measurable and t (s ) is continuous for 
n n a 

every n greater than some fixed N by Lemma 7 so we may apply 

Lemma 5 to assert 

E[ w |8 ]-->£ a.s. 

3 j 

Now by ( W2) / | E[ w | 8 ] | < K / b < co so that (26) is 

j j j 

satisfied for X = E[w|8 ] - 0 ar>d an application of 

j j j 

Lemma 6 and (29) to the right-hand side of (30) establishes 

(28) . 

□ 



D. Asymptotic Normality 

We first state a Lemma due to Burkholder (see [3] for a 
proof) and then use it to obtain a result on the convergence 

of s in the quadratic mean, 
n 

Lemma 8 (Burkholder) Let {X } be a non-negative sequence 

n 

of real numbers and {q } , [r } real number sequences with 

n n 

lim inf q = g > p > 0 and lim sup r = r > 0 such that for 
n n 

every n > N 
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Then 



q p+i 

X < {1 - _n) X + r / n . 
n+1 n~ n n 



-p -p 

X < r_ n + o (n ) . 

n g - p 



Lemma 9 E[ (s - s ) 2 ] = 0 (n - 1 ) 

n a 



Proo 



■c . 



In what follows we write s* 

n 



= s - s . Expanding (8) , we 
n a 



obtain 



(31) s* = s* - a [ F (s ) - a + Z ]. 

n+1 n n n n 



If we expand F(s ) in a Taylor's series about s we then get 

n a 



(32) F (s ) - a = F (s ) + (s - s ) f (s } 

n a n a a 

+ 5 (s - s ) - a 

n a 

= /9s* + S(s*) , 

n n 

where §(x) = o(x) as x-> 0 because of (F2) . Ke write 5 

n 

for S (s*) in what follows. Substituting (32) into (31), 
n 

squaring and simplifying yields 



(33) 



2 2 

s* = (1 - 2a /9 ) s* - 2a fl s* ( Z + § ) 

n+1 n n nnn n 
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+ a 2 ( A s* + 5 + Z ) 2 

n n n n 



In order to apply Lemma 8 to (33) we define 



g = 2nay9(1 + Z /s* + § /s*) 

n n n n n n 



--> 2nA_ (1 +o( 1 ) ) a.s. 
n A 



— > 2 a.s.. 



where lim g /s* — > 0 by the definition of 5(*). We 
n ->co n n 

also take 



r = n 2 a 2 (As* +5 + Z ) 2 

n n n n n 



= n 2 a 2 [ F (s ) - a + Z ] 2 
n n n 



— > Z 2 / A 2 a.s. 
n 



— > a(1 - a) / A 2 a.s. 



We then rewrite (33) as 



2 g 2 r 

s* = (1 - . n) s* + n ; 

n+1 n n n 7 



2 

an application of Lemma 8 with c=2, p=1 then shows that s* 

n 

= 0(n -1 ) a.s. and so we conclude 



E[ (s - s ) 2 ] = 0 (n - 1 ) . 

n a 

n 



We now state a specialization of a theorem due to 
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Fabian [9] to show the asymptotic normality of s . The 

n 

notation I stands for the indicator function of the set 

{t} 



{t} , i.e. 



I (x) = 1 x € {t} 

} 

= 0 x fL {t } 

Lemma 1_C (Fabian) Let B be a non-decreasing sequence of 

n 

o-fields, 8 c 8. Let A , B , V ,0 and T be 

n n n-1 n-1 n-1 n-1 

B -measurable random variables with 
n 



A — > a a. s . , 
n 

B — > b a. s . , 
n 

T — > t a. s. or E[ (T - t) 2 ] — > 0, 
n n 

with a,b,t € R. V satisfies 

n 



(34) 



E[ V | B ] = 0 a . s . , 
n n 

C > E[ V 2 - a 2 1 B ] — > 0 a.s., 
n n 



1 1 E[I (V2) | B ] 

n 3=1 {V 2 > ne} n n 

n 



— > 0 a.s.. 



for every e > 0, while U is defined by 

n 



r A -l -3/2 

U = 1 - _n U + 1 B V + n T. 

n+1 L n -> n n n n n 



Then 



1/2 L 

n U — -> N [ t/(a - 1/2) , o 2 b2 ] . 
n 
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□ 



Theorem 3s is asymptotically normal with mean s and 
n a 



variance a(1-a)/n£ 2 . 



Proof : 

To apply Fabian's theorem we use the Taylor's series 



expansion of F (s ) ; putting (32) into (8) and simplifying we 

n 

get 



s* = (1 - a j8 ) s* - a Z - a 5 
n+1 n n nn nn 



Now we can take 



A = /5/d --> 1 a . s. , 

n n 

B = -n a --> - /9 ~ 1 , a.s. 
n n 

3/2 

T = n a & 
n n n 

E[T 2 ] = n E[ S 2 / d 2 ] 
n n n 

--> 0 , 



-1 

since 5 2 = o (n ) by Lemma 9. Furthermore, we have 
n 



E[ 2 | B ] = 0 a.s. , 
n n 



E[ Z 2 | B ] = F (s ) [ 1 - F (s ) ] 
n n n n 

— > a (1 - a) a.s.. 



while the convergence of (34) follows at once from the fact 

that Z is bounded. Thus we conclude from Lemma 10 
n 
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( 35 ) 



1/2 L 

n (s - s ) > N [ 0, a(1-a)j9 -2 ]. 

n a 



To show that d also has an asymptotically normal 

n 

distribution we need a Central Limit Theorem for the sum of 

a sequence of dependent summands. For a proof, see Loeve 
[24], p. 377, Theorem C. 



Lemma 1 M (Loeve) 



Let {X } be a sequence of random 
n 



variables with S =1 2 x • If 

n n 3=1 j 



(i) E[ X | X , ... , X ] = 0 a. s. , 

n 1 n -1 

(ii) Var[ S ] = 1 J E[X 2 ] = a 2 < « , 

n n ? g = 1 3 n 

(iii) 1 1 e[ |E{X2|X ,...,X. } - E { X 2 } | ] — > 0, 

n* 3=1 3 1 3-1 3 

and (iv) for each e > 0 , 



1 Z E[I (X 2 ) ] --> 0, 

n? 3=1 { I X |>e} k 

k 

then S has an asymptotically normal distribution with mean 
n 

0 and variance a 2 . 

n 



Theorem 4 d has an asymptotically normal distribution 

n 

g-i 



with mean >9 and variance Q (n ) . 



Proof : 
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From (30) we have 

B = 1 2 (w - E[w (8 ]) + 1 § E[w I B ], 

n n j = 1 j j j n j=1 j j 

where the second term converges a.s. to ^ . In order to 
apply Lemma 11 to the first term we define 

v = w - Ef w | 8 ] . 

k k k k 

Clearly , 



(36) 



E[v | 8 ] 
k k 

E[v2, 8 ] 
k k 



= 0 . 



E[ (w - E[ w | 8 ]) 2 , 8 ] 
k k k k 



E[w2| 8 ] - 2 E[v t (s ) | 8 j 
k k k k k k 

+ E[t2(s ) | 8 ] f 
k k k 



where we have used the fact that 



E[ w 1 8 ] = t (s ) a.s. 

k k k k 



from Theorem 2. Also 



E[w2|8 ] = 1 

k k 



k 

k ( k 



C' i 


r s - y a 


/ w2 


k 1 


~ 00 1 


L —5 — J 


k 


. ) . 





Simplifying (36) then yields 



E[vf | B ] = T (s ) - 1 2 ( S ) 
k k k k k k 



= B (s ) . 
~ k k 



Now Parzen [20] has shown that 
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/“ 



lim 9 (s ) = b~ 1 f (s ) / W 2 (u) du; 

n-> oo n a n a 

we note that Jv 2 (u) du is finite by (W2) and (W3> but that 

the limit diverges because of the definition of b (7) . The 

n 

proof of Lemma 7 may be extended at once to show that 0 (s ) 

n a 

is continuous (at least for all n greater than some fixed N) 
so an application of Lemma 5 shows that 



E[ v 2 | 8 ] --> b~i f (s ) / W 2 (u) du a . s. 

k k k a 



/ 



Now we conclude 



E[vf ] = E[ E[ v 2 1 8 ] ] 

k k k 



— > b~ 1 f (s ) / W 2 (u) du, 

k a 



f' 



so that 



n / n 

1_ 2 E " v 2 ] — > f (s ) / W 2 (u) du 7 n- 2 b“ x 

n? j=1 j ay j=1 j 



-g 

The summation clearly converges; if in fact b = b n , 

n 



1 < g < j, an application of Euler's summation formula shows 



a 2 
n 



f (s ) f W 2 (u) du n g 

— > a J 7 1 

Ffi 2 ’ j = 1 



— > 



f (s ) f W 2 (u) du r g-1 g-2 

a J n + 0 (n ) I . 

B7T+g y L J 
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la this Chapter we describe some methodological 
considerations in quantile estimation using both order 
statistic and stochastic approximation estimators. The 
emphasis throughout is on practical application of the 
techniques in finite samples of data rather than on the 
asymptotic theory of the first two Chapters. 

It has long been known that the finite sample behavior 
of the basic stochastic approximation quantile estimators is 
seriously flawed from a practical viewpoint (Cochran and 
Davis [4]; Wetherill [36]; and Davis [6]). Since the 
problem of finite sample analysis of stochastic 
approximation estimators is analytically intractable we rely 
for the most part on digital simulation to examine the 
finite sample properties of our new estimator; it will be 
seen that most of the drawbacks have been overcome. 

The asymptotic distributions asserted by (1.6) and 
(1.7) f or order statistic estimators and by (1.15) , (1.28) 
and (2.35) for the various stochastic approximation 
estimators may fail to describe the actual distribution of 
the estimator for some given n either because this actual 
distribution is markedly non-Gaussian in shape or because 
its mean and variance deviate appreciably from the 
theoretical values. In this Chapter we are for the most 
part concerned with the first difficulty, leaving the 
discussion of estimator bias and mean squared error for 
Chapter IV. 
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A. Order Statistic Estimators 



1. Basic considerations 



As 


pointed . out 


in 


quantile 


, . , A 

estimator s 


for 



n 



Chapter I, the order statistic 
the a-quantile is given by 



with 

case. 



u = 
here 



A 




[ a(n+1) ]. Unlike the stochastic approximation 

we need not rely on the asymptotic normality of 



s to obtain a confidence interval on s ; non- parametric 
n a 



confidence intervals may 



be constructed from the 



relationship (David [5]) 



(D 



Pr { X 

(t) 



< s < X } 
a (v) 



v-1 i n-i 

. a (1 ' a) 
i=t tij 



This formula may be evaluated using a table of the 
incomplete Beta function (see, for example, Kendall and 
Stuart [17 ])>; however, direct use of the relation (1) is 
impractical and unnecessary for choosing the values of t and 
v for large sample sizes n since suitable values for given n 
and a may be obtained by using the normal approximation to 
the binomial random variable. For a 100 p % confidence 
interval we have 



t ~ a(n+1) - vAr^T-ayn u 

P 
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v = a(n+1) + v/'alT-aJ’n u , 

P 

where u is the upper 1 - 1 p significance point of a unit 
P 2 

normal variate. To obtain a conservative interval, we round 
t down and v up to the nearest integer. 

The quantile estimation problem may then be reduced to 

finding three order statistics X , X and X ; this 

(t) (u) (v) 

does not require that the entire X sample be sorted nor need 

we save the entire sample. In fact, just a bit more than 
a n sample values (or (1-a)n values for a > 0.5) must be 
stored. The three order statistics may then be found by 
applying Floyd and Rivest's SELECT algorithm [11] which 
requires an average amount of work proportional to n. This 
then represents a substantial computational advantage over 
the naive method of sorting the entire sample, as well as 
decreasing the memory requirements somewhat. 

There remain, however, several serious shortcomings to 
the order statistic method. First, if more than just a 
single quantile must be estimated the memory requirements 
will probably increase drastically and the amount of work 
also increases quickly. For the simultaneous estimation of 
the 19 quantiles of Table I it will still be necessary to 
store the entire sample and the work needed to find the 57 
order statistics of interest will be comparable to the 
effort required to sort the sample as a whole. 

This may be shown to be the case by considering that 
the number of comparisons between observation values is a 
rough measure of the total amount of work required to sort a 
sample (or to find the order statistics of interest) . The 
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SELECT algorithm [11] requires an average of about 

n[ 1 + min(a,1-a) ] comparisons to find X so that finding 

(u) 

the median requires about n/2 comparisons. Once the median 

is found, the upper sample guartile (i.e., 0.75 order 

statistic) must be found in a set of data which is only half 
as large as the original sample (this is a result of the 
sorting method employed) ; this requires n/8 comparisons, on 
the average. Proceeding in this manner, we find that 
determining all 57 order statistics will take about 15 n 
comparisons; a complete sort, on the other hand uses about 2 
n In n comparisons (see Knuth [19]). The advantage will 
then be with the complete sort for values of n less than 
1500 and the amount of work will be about the same for 
1500 < n < 10,000. 

Since order statistic estimation is not basically a 
sequential scheme, a second shortcoming of order statistic 
estimation arises when it is found that a larger sample is 
needed, perhaps because the estimates in a sample of size n 
are not precise enough or perhaps because more data become 
available. If one wishes to take advantage of the savings 
possible in storing only a n of the observations one must 
fix the value of n in advance. When a larger sample is to 
be investigated it will not in general be possible to find 
the exact order statistic of interest in the pooled sample 
unless all of the discarded data from the original sample 
can also be reviewed. Furthermore, the operation of the 
SELECT algorithm will still require an amount of time 
proportional to the new (larger) sample size. 
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2. Decreasing the storage - Payne's method 



The most serious difficulty with order statistic 
estimators is the inescapable linear growth in storage 
requirements with sample size. For this reason, a technique 
due to Payne [29] may be considered. A value m < n is first 



chosen; Payne shows that m may be proportional to ./n. 



An 



array of size m is set aside and filled with the first m 

observations on X. The array is sorted and then, using (1) , 

a confidence interval on s is obtained. Observations 

a 

outside the confidence interval are discarded and new 

observations are obtained to fill the array. Any 

observation which does not fall within the confidence 

interval is counted toward the total number of observations 
but is not put into the array. When the array is again 
filled it is sorted in place and a new, narrower confidence 
interval is chosen. (The new interval is narrower in the 
sense that it is shorter than the earlier one, but it will 
have the same probability mass from (1) since it is based on 
a larger sample. Note that it will in general have more 
observations than the earlier interval.) The procedure is 
repeated until all the observations have been examined. 



The main drawback to Payne's method is that if the 
initial confidence interval is not wide enough the technique 
nay fail to cover the required order statistic when the 
entire sample has been examined. For this reason, the 
technique should probably be employed with extremely 
conservative confidence intervals - say 4-5 standard 
deviations - with the actual desired confidence interval 



60 



Non-parametr ic Quantile Estimation Through 
Stochastic Approximation 

chosen at the final step. For example, to determine the 
median of a sample of 10 6 observations with very low 
probability of failure a total storage requirement of some 
8000 observations should be ample. 

The estimation of several quantiles by this so-called 
partial sorting method appears to involve a fairly complex 
algorithm, but the method should be useful for a small 
number of quantiles (say two or three) in fairly large 
samples of data. Although the method still requires memory 
which increases with sample size, the presence of more 
observations can often be handled by simply decreasing the 
coverage of the last confidence interval. 



3. Approximate order statistics - Averaging 

Another possible application of the order s+atistic 
method is to consider the X sample in sections of some fixed 

size, say 100 observations. We can then choose Y = X 

i (100a) 

in section i. The final estimate could then be the average 

of the Y 's or we may once again sort the Y sample and 
i 

choose an appropriate order statistic as an estimate. If 

the second technique is adopted one may obtain yet another 

level of sections of the Y order statistics and then choose 

Z = Y ; we call this a "nested" method. Both the 

i (100a 1 ) 

average and nested methods can be thought of as approximate 

order statistic methods since they do not find the actual 
order statistic in the entire sample but rather a value 
close to it. 

The chief drawback to the averaging method is that 
there may be appreciable bias in the Y values if these are 
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drawn from samples small enough to be practical; Table II 
indicates some results for extreme quantiles from several 

Quantile Bias for Sample of 



Distribution 


Alpha 


Value 


100 


1000 10000 


Exponen tial 


0.5 


0.6931 


-.0050 


-.0005 -5X10-5 • 




0.9 


2.3026 


- .0442 


-.0045 -.0004 




0.99 


4.6052 


-.4175 


-.0487 -.0049 




0.999 


6.9077 




-.4223 -.0491 


Normal 


0.5 


0.0 


-.0125 


-.0013 -.0001 




0.9 


1.2316 


-.0320 


-.0033 -.0003 




0.99 


2.3263 


-.1782 


-.0206 -.0021 




0.999 


3.0902 




-.1361 -.0158 


Uniform 


0.5 


0.5000 


-.0045 


-.0006 -5X10-5 




0.9 


0.9000 


-.0089 


-.0010 -.0001 




0.99 


0.9900 


-.0198 


-.0020 -.0002 




0.999 


0.9990 




-.0010 -.0001 


Cauchy 


0.5 


0.0 


-.0159 


-.0015 -.0002 




0.9 


3.0777 


-.0098 


-.0010 -.0001 




0.99 


31.820 


-.0103 


-.0010 -.0001 




0. 999 


318.31 




-.0010 -.0001 


Table II. 


Bias of 


the order 


statistic quantile estimator 


for various 


d istribut 


ions. Note 


that 


these biases are for 


single order 


statistics; unbiased esti 


mates of the median in 


the normal. 


uniform and Cauchy 


cases 


may be obtained by 


taking the 


usual sample median. 


Biases were evaluated 


analytically 


for the 


exponential 


and 


uniform distributions 


and by Ga uss-Legendre quadrature far 


the normal and Cauchy 



distributions. 



'i 
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common distributions. Th 
estimator will converge to 
larger samples are obtains 
is objectionable or not dep 
of the total sample size 
seem preferable to adopt an 
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s that the 
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otic error 
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certainly 
d scheme. 



It 


should 


be 


po 


inted 


out 


that 


for the expon 


ential 


distribution the 


bias 


is 


about 


10 % 


of the true 


0.99 


quantile 


value 


for 


a 


sample of 


100 observations and 


about 



1 % when 1000 observations are considered. The normal 
distribution has similarly poor properties so that quite 
large sections may be required in these cases if bias is not 
to be a problem in the final approximate order statistic 
estimate . 



Usually bias can be removed by using the 
technique (see Hiller [27]) but since the order s 
are very non-linear functions of the observat 
jackknifing eliminates bias only at the cost of a 
inflation of the variance. This inflation was fo 
very bad for small samples by Goodman, Lewis and 
[14], where empirical evidence demonstrated that 
square error of the jackknifed estimators was 50 
than for the ordinary order statistic method for s 
from 1000 to 10,000 observations. Moreover, imple 
of a jackknife scheme is complicated by the requi 
sort not only the entire section but also a 
subsections . 



jackknife 
ta tistics 
ions the 
serious 
ur.d to be 
Robbins 
the mean 
% larger 
acples of 
mentation 
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4. Approximate order statistics - Nesting 

If we use sections of size n in an approximate order 
statistic method and then choose 
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Y = X 

i (u) 



with u = [a(n + 1) ], then s has the same value as the 

a 

a -quantile of the Y values, where 
Y 



a 

Y 



Pr{ Y < s } 
a 



Pr [ X < s } 
(u) a 



n 

.2 

1 = U 




i 

a (1 - 



n-i 

a) 



This is just a generalization of the two transformation 

methods of Section I.C. For a nested scheme, then, we 

accumulate a sample of n Y observations and choose 

Y 



Z = Y 
i (U y ) 



with u = [a (n +1) ]. The extension of this technique to 
Y Y Y 

higher levels of nesting is straightforward. 



The price we must pay for this reduction in the storage 
requirements is an inflation of the asymptotic variance just 
as in the case of the maximum and next-to-maximum 
transforms; note that the averaging method involves no such 
inflation as long as the X sections are large enough for the 
asymptotic variance (1.5) to hold approximately. If Z 

i 

were taken directly as an order statistic from an X sample 

of n n observations we would have 
Y 



Var[ Z ] 
i 

with the nesting scheme 



aj[1 - a). 

I n! ? ]s F 
Y a 



however. 
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Var[ Z ] = 

i 



where 



j-n i u-1 n-u 

f (s ) = «- u J ua (1 - a) f ( s ) . 

Y a a 

(See David [5].) Thus, the variance will be inflated by an 
approximate factor of 

n a (1 - a ) 

Y Y 



riii 2 2 2u- 1 2(n-u)+1 

LuJ u a (1 - a) 

For example, if we estimate the 0.99 quantile by 

considering a Y sample generated by taking the 99th order 

statistic in X sections of 100 the variance of an estimate 

based on a Y order statistic will be 1.437 times the 

variance of an estimate taken from the X sample as a whole. 

Since a = 0.73576 in this case, we may continue the nesting 
Y 

process by choosing n = 100 in which case we take 

Y 

Z = Y : the variance will then be further inflated by a 

i (74) 

factor of 1.566 for an overall inflation of 2.242. We may 
obtain results with the same precision by considering a 
larger sample (assuming data is available) ; in the present 
case, we need a total X sample of 14,400 to obtain a 
variance equivalent to n = 10,000 in an untransformed case. 
The total storage requirements, however, are now yust 244 
observations - 100 for the X samples and 144 for the Y 
sample. Similarly, we may deal with a total X sample of 
2,250,000 by using a triply nested scheme with 100 X, 100 Y 
and 225 Z observations, thus obtaining a variance equivalent 
to n = 1,000,000 in the unnested case. 



a (1 - a 
Y Y 

n-f^Ts ~T~ 

Y Y a 
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The nested order statistic scheme results in the 
smallest asymptotic memory requirements - 115 In n for 
repeated sections of 100 - but the increase in variance by a 
factor of about 1.5 per level is a very serious drawback. 
There is also the problem of determining the proper sample 
sizes and order statistics at each level - a problem which 
is most easily solved if the sample size can be specified in 
advance. The determination of the bias of the nested 
estimators and investigation of some reasonable way for 
finding confidence intervals are areas for further research, 
but the problem of variance inflation would seem to rule out 
these estimators unless a virtually unlimited amount of data 
is available. 
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5. Summary 

A summary of the order statistic quantile estimation 
methods discussed here appears in Table III; biases given 
are for the 0.99 quantile of the exponential distribution. 
Despite the conceptual simplicity and well-understood 
behavior of these estimators, we have shown then all to lack 
some desirable features. If we wish to estimate a set of 
quantiles based on a fairly large amount of data (say 
100,000 observations) order statistic estimators are clearly 
inadequate. 
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B. Robbins - Monro Estimators 

It should be mentioned at the outset that the basic 

Robbins- Monro (RM) process cannot be applied directly as a 

quantile estimation technique in any practical method since 

its properties depend so heavily on the unknown parameter 0 

= f (s ) , i.e. the value of the derivative of the unknown 
a 

distribution function at the unknown quantile. The 
properties also depend to a lesser degree on the starting 

value s but the situation is not nearly so critical there. 

1 

Both modifications to the basic RM process considered here 

overcome this difficulty by simultaneously obtaining an 

estimate of s and 0 ; we thus investigate the RM process 

a 

applied to a known distribution using the optimum step size 

A = 0 in order to obtain results which should be better 
than those for- methods which employ estimates of 0 . 



1. Selecting the starting point 

The first problem to be faced when dealing with RM 
quantile estimation is the selection of the initial guess, 

s . The results of Hodges and Lehmann [15] indicate that 

the bias of the RM estimator is closely related to that of 

s so that starting with a value which is close to s is 

1 a 

desirable. We must have E[ s* ] < 00 in order to preserve 
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mean square convergence. One approach is to take a pilot 

sample with perhaps 1000 or 2000 observations and begin RM 

with an order statistic estimator; a second approach is to 
use a nested approximate order statistic estimator, as 
discussed in the previous section. 

This latter approach is in fact adopted here; since we 
will for the most part be employing the maximum transform in 
this work, we begin all the stochastic approximation 
estimation procedures by choosing 



X* 


= max 


c 


X , X , . . . 


1 




1 2 
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X , X 


2 




v+1 v+2 
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= max 
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3 




2v+ 1 2v + 2 
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as an empirical model for data and the 
commonly used in statistical inference 
typical of the contemplated application 
approximation estimators. 
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8. 169399E-01 


Maximum 


4. 213145E 00 



Skewness 

Kurtosis 



7. 798659E-01 
1.001406E 00 



Figure 1. Bias of the initial estimate s for the 0.99 

1 

quantile of the unit exponential distribution; v for maximum 



transformation is 56. True quantile value is 4.6052. 

Histogram sample size is 5000 observations on s*. 

n 
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The bias of the initial estimate s for the exponential 

1 

0.99 case is indicated by the histogram of Figure 1. The 
histograms for stochastic approximation quantile estimators 
in this Chapter display the bias of the estimators, i.e. 

s* = s - s , 
n n a 

rather than the estimator values themselves. In this 

Chapter, we use the term bias to refer to the entire 

distribution of s* rather than to E[ s* ] as is usual. Data 

n n 

for Figure 1 , as well as for the other histograms, was 

obtained by sampling pseudo-random numbers from various 
distributions; these were generated by the Naval 
Postgraduate School random number package LLRANDDM [21] and 
its extensions [31]. Note that the information of Figure 1 
could have been obtained analytically, but the details would 
be messy. 

The caption for each histogram in this Chapter 

indicates two sample sizes; one (the "X sample") for the 
total number of X observations from the underlying 

population used to compute the statistic (for example, s*) 

n 

whose distribution is displayed and the other (the 

"histogram sample") for the number of replications of this 
statistic used to compute the histogram and the sample 
summary statistics printed. Note that the X sample size 
will be larger than the indicated number of stochastic 
approximation steps taken because the X sample includes the 
3 v values used for the starting point. Also, the number of 
steps taken in the stochastic approximation will be smaller 
than the corresponding sample size by a factor of v when the 
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maximum transform is used (or w for the next- to- maximu m 
transform) . 



The letters "Q' 1 printed below the histogram and above 
the scale indicate the location of the sample guartiles 
(including the median as the second quartile) while the 
letter "M" indicates the sample mean. The M may be printed 
instead of one of the Q's if they appear in the same column; 
this phenomenon occurs in Figure 1. 



The basic RM process 



We begin our investigation of the distribution of s* in 

n 

the RM process by considering ’ the untransformed RM 

estimator, i.e. one which takes a step with every sample 

value. We use the optimum step size A = f (s ) = 0.01 

0.99 

for the exponential distribution. The results are shown in 

Figures 2 and 3 for s * and s* ; the distributions are 

1121 5601 

clearly grossly non-normal, despite the asymptotic normality 

indicated in Chapter I. Note that the appearance of Figure 
3 does not suggest much of an improvement despite an 
additional 4480 X observations; the skewness and kurtosis of 
the estimator are, if anything, increasing with sample size. 

An explanation of this behavior becomes clear if we 

consider the effect of the first observation, X . Because 

1 

of the negative bias in s^ (see Figure 1) , the probability 

that X > s is sliqhtly greater than 0.01; this means that 

11 

about 1.5 % of the time the second quantile estimate is 
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s = s + 0. 99 / (0,01 X 1) 

2 1 ' 

= s + 99,0 . 

1 

This is obviously much larger than the true quantile value 
of 4,60 so we expect that all of the observations on X will 

be less than s with high probability until the estimate has 
n 
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Figure 2. Bias of the RM stochastic approximation estimator 

s* for the 0.99 quantile of the exponential distribution. 

1121 

Maximum transform was not used. X sample = 1288 

observations; histogram sample = 2500 replications of s* . 

112 1 
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reached a reasonable level, perhaps 6.0. This in turn 
requires that the RM process take downward steps for about 
90 units. These downward steps are proportional to 1 - a 

according to (1. 10) and in this case are exactly equal to 
1/n. The value of n such that 

A i - ■ 90 
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Figure 3. Bias of the RM stochastic approximation estimator 

for the 0.99 quantile of the exponential distribution for an 

X sample of 5768 observations; maximum transform was not 

used. True value is 4.6052. Histogram based on 2500 

observations on s* 

5601 



74 



No n-param etr ic Quantile Estimation Through 
Stochastic Approximation 

is about 2 X 10 39 so that the RM process will in this case 
have 1.5 % of its distribution in the extreme right-hand 
tail at a substantial distance from the true quantile value 
for an^ reasonable sample size. 

An additional 4 % of the quantile estimates will also 
move upwards a distance of 49.5 units after having taken the 
first step down, while 5 % and 8 %, respectively, will take 
the third and fourth steps upwards. Thus, nearly one-fifth 
of the time the RM process will be over 20 units from its 
starting point (and from the vicinity of the true value) 
after only four observations. This then accounts for the 
appearance of Figures 2 and 3; a similar situation exists 
with random samples from a wide variety of parent 
populations, i.e. it is not particular to the exponential 
distribution . 



3. The gain sequence shift 

What is needed is a way to decrease the size of the 
first few upward steps without changing the asymptotic 
behavior of the RM process. This can be done by using the 
gain sequence 

(2) a = 1 

n 7nWK 

instead of the 1/n sequence of (1.11), where k is some 
positive constant, referred to hereafter as the shift 
constant. The proofs of Dvoretzky [7] and Sacks [33] allow 
for gain sequences of the form (2) and so we preserve the 

almost sure convergence and asymptotic normality of s . 

n 

For the exponential 0.99 quantile case a k value of 98 
would reduce the initial upwards step to a reasonable size 
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of 1 unit; from this point we need to move down a distance 

of only about 0.9 (on the average) to reach the true value 

of s . The n value such that 
0.99 




is 146 so that there will be no difficulty in reaching the 
close proximity of the true value given a reasonable sample 
size. 



Since the initial estimate s is actually based on a 

1 

sample of 168 X observations, we adopt a shift constant k of 

167; the resulting distribution of s* is shown in Figure 

1121 

4. The data from the X population for this Figure are the 

same as in Figure 2 with which Figure 4 should be compared. 
Clearly the introduction of the shift constant has greatly 
improved the finite sample properties of the estimator. 



Under more general conditions we wish to determine a 
gain seguence shift k such that the effects of a bad initial 
step can be reversed in a reasonable number of additional 
steps. Assuming that a > 0.5, the "bad" direction is upward 
and the initial step is a / fi (k+1) , using the optimum step 
divisor A = yS. Writing j for k+ 1 we must then find an n 
large enough that 



or 




1 - a > a , 



j + n 

.2 i_l 

i=g •» 1 



> a 

TT=sn 
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Table IV shows values of n for various values of a and j. 
It is clear that using a shift constant of 100 to 200 may be 
useful for 0.01 < a < 0.99. 

Another interpretation of Table IV is also possible: 
the entries show the minimum number of additional 
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Figure 4. Bias of the RM stochastic approximation estimator 

s* for the 0.99 quantile of the exponential distribution 

1121 

using a shifted gain sequence with k = 167. Maximum 

transform was not used. X sample was 1288 observations; 
histogram sample size = 2500. 
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observations needed to overcome an incorrect step upwards at 



stage j 


of 


the RM 


process 


. Note that 


as j 


increases this 


number 


of 


steps 


approaches a limit 


which i 


s approximately 


a / (1 


- a) 


. This 


means 


that the RM 


process 


tends to remain 


in the 


vicinity of the true value s 

a 


once it 


has reached it 


since here 


it will 


take a 


steps down 


on the average for each 


1 - a s 


teps 


upward 


• 
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2X1 0 6 
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200 
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It) 


129 


29310 


345 


300 
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10 


118 


8095 


517 


500 




4 


10 


110 


3191 


861 


1000 




4 


10 


105 


1717 


1720 



Table IV. Number of additional observations required for a 
shifted stochastic approximation method to reverse an 
initial unfavorable step. The shift constant is one less 
than the entry in the first column. The entries may also be 
interpreted as the number of observations needed to reverse 
an incorrect step upward at step j. The last column gives 

j + n 

the value of n satisfying 2 i _l > 1. 

i=g + 1 
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We thus see that estimating the stochastic 

approximation starting point s^ by an order statistic method 

from an initial sample whose si?.e is roughly proportional to 
a / (1 - a) and then beginning the RM process with a shift 
constant k = a / (1 - a) will avoid most of the serious 
instabilities of Figures 2 and 3. An interesting feature of 
this result is that it is distribution-free in the sense 
that the optimum step size multiplier 1 / 0 does not appear 
in an explicit way. However, whether shifting the gain 
sequence will result in an effective estimation procedure 

depends on the bias of s^ as well as the properties of the 
random variable X whose quantile we are estimating. 



For example, if the random variable X is widely 
dispersed it is quite possible that the RM process will take 
two or even mor-e steps in the wrong direction. Since the 
harmonic series on which Table IV is based grows 

logarithmically the effect of several such incorrect steps 
may require many times the sample sizes indicated to 
overcome. The typical shape of the distribution of 

stochastic approximation quantile estimators is that of 
Figure 4; the long tail to the right is made up of 
estimation sequences which are in the process of correcting 
multi-step errors. 



If there is an appreciable bias in 6^ then a large 

shift constant may seriously impede the convergence of the 

estimator to the near proximity of s . The biases indicated 

a 
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in Tables II and III in some cases are large enough to cause 
difficulties here and the order statistic estimators used to 

obtain the initial estimate s^ estimators are subject to 
considerable sampling variation. If the initial sample size 



for finding s^ is n^, then on asymptotic grounds from. (1.7) 
the initial variance is 



z a)_ 

1 E b? 

1 

which might be inflated somewhat if a nested scheme is 

used. Since n - a / (1 - a) , the initial standard 

1 

deviation will be 



<7=1 -_a , 

1 ™ 

which is n^ times the size of the first downward step. Thus 

if the initial estimate s is just one standard deviation 

1 

high we need a sample of at least n observations to overcome 
this, where 



or 



(3) 



n + j 

7. 1 - a > 1 - a 

i = j ~ /9i J~ 



n + j 

? i- 1 > 1. 






The last column of Table IV gives values of n satisfying 
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( 3 ) . 



In a given case it is thus possible that both the bias 
and the sampling variation of s^ will combine to produce a 

starting point which is far from s . If this is so an 

a 

unreasonably large sample may be needed to obtain a nearly 

Gaussian distribution for § when a is close to 0 or 1. The 

n 

long tail of Figure 4 is at least partially due to this 
phenomenon, especially in view of the skewed distribution 
of Figure 1 . 

4. Maximum and next-to-maximum transforms 

The only way to overcame this problem is to transform 

the a values being used to values closer to 0,5; this of 

course can be done by means of the maximum or 

next-to-maximum transform methods of Chapter I. In the 

context of our present discussion, it is clear that these 

transform techniques work because the effect of steps in the 

wrong direction can be readily reversed. Examples of the 

maximum transform (s*') and next-to- maximum transform (s* M ) 

20 6 

used for estimating the 0.99 quantile of the exponential 

distribution are shown in Figures 5 and 6. The theoretical 

(asymptotic) variances of s' for these Figures are .1242 and 

n 

.1848, respectively, which compare well with the observed 
values of .1431 and .1842. The distributions in both cases 
are normal or nearly so. 

Examination of Figures 4, 5 and 6 (together with a 
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great deal of data from other distributions and guantiles) 
leads to the general conclusion that the distributional 

properties of the stochastic approximation estimator s are 

n 

greatly improved by these transformation schemes. The 
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Figure 5. Bias of the RM stochastic approximation estimator 
for the 0.99 guantile of the exponential distribution using 
transform with v = 5G. X sample is 1232 



the maximum 
observations 



histogram based on 2500 replications of s*'. 

20 



82 



Non-parametric Dgiantile Estimation 
Stochastic Approximation 



Through 



next-to-maximum method seems to result in a more nearly 
Gaussian shape (as measured by the sample coefficients of 
skewness and kurtosis) for the distribution and agrees more 
closely with the asymptotic variance, but both transform 
methods give quite satisfactory results even in relatively 
small samples. 
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A further advantage of the transform methods is that 
they involve less computational effort than does the 
unt ransf ormed (direct) technique. In fact the computation 
time for the untransformed case is over four times that for 
either transform method. Thus if the X sample is being 
generated by a pseudo-random process within the computer it 
may be more efficient computationally to use one of the 
transform methods despite the variance inflation which 
requires us to generate a larger X sample for the same 
estimate precision; the time saved in the estimation 
procedure may be sufficient to offset the generation time 
for the larger sample. 



5. Direct application of the RM method 

In the previous Subsection we used a fixed step size 
A = P, chosen so as to give the best asymptotic variance. 
As indicated earlier, the RM process cannot be applied 
optimally (i.e., with minimum asymptotic variance) in any 
real situation simply because we do not know the actual 
value of P . If a reasonable initial estimate of $ can be 
found, however, it may be possible to use the RM process 
directly for quantile estimation. 

This was done in the work of Goodman, Lewis and Robbins 
[14] and also by Yuguchi [38]. They used the same starting 
value as in the present work, but with a random A value 
given by 



X = 



X ' - X' 

< 3 > ( 1 ) 



8 (X' - X* ) (X' - X' ) 

(2) (l ) (3) (2) 



This I is used for all steps in the stochastic approximation 
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estimation process as opposed to the Venter method and the 

new method which use a dynamically changing A value. A 

second instance in which direct application of the RM 

process was attempted is given by Iglehart [16]; in this 

case a fixed estimate of f{s ) based on the empirical 

a 

distribution function was used. 

Now the convergence of s’ to a limiting normal 

n 

distribution with variance 0(n- 1 ) requires that we have A < 
(Sacks [33]) . This will not in general always be the 
case for or for any other estimate of £ . It is known 

that the convergence may be much worse for A > 2 $ ; for 

example, when A = 2 0 the variance is 0 (log n/n) (Major and 
Revesz [26]). Thus, the stochastic approximation process 
with a fixed gain sequence multiplier may result in very 
poor convergence properties even if the distribution does 
not blow up as in Figures 2 and 3. 

In particular, the results of Yuguchi [38] indicate the 
-1/4 

presence of an 0 (n ) component in MSE[s']; also, the 

n 

sample coefficients of skewness and kurtosis of the 3K 

estimators increase with increasing sample size rather than 
decreasing as we would expect if the distributions were in 
fact approaching normality. The RM quantile estimators were 
also found to give "erratic results" by Iglehart [16] and he 
recommended that they not be used. . 

It is possible that these results could be improved if 
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a density estimator with better properties than ft or the 

derivative of the empirical distribution function could be 

found. A possible candidate is just the kernel estimator of 
(1.32); see Rosenblatt [32] or Parzen [28]. We prefer to 
use a method which is guaranteed to have the minimum 
asymptotic variance, however, and so in Section III.C we 
turn to techniques which have this property. 



6. Summary 



The general conclusions 
nested method for selectin 



of this 
g s is 

V 1 



Section are that the 
sufficiently robust and 



that the maximum transform is a computationally and 

statistically effective technique for RM quantile estimation 
for well-behaved X populations. The next- to-maximum 
transformation and the gain sequence shift are also useful 
and may be jiecessary in some cases to increase the 
robustness of the RM process. Finally, the finite sample 
and asymptotic properties of methods using random values for 
the gain sequence divisor A will be much better if those 
values converge to the optimum value p rather than 
remaining fixed. 



C. Venter's Estimator 

With Venter's method we enter the realm of techniques 
which can be applied to real estimation problems, i.e. those 
in which 0 is unknown. General experience with the Venter 
estimator, however, shows that it is not very robust and 
often tends to bio:/ up. 
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1. Choice of parameters 



The first question to be addressed in a practical 
implementation of this stochastic approximation method is 

the choice of the finite difference sequence {c } , which 

n 

from (1.22) is given by 



-r 

(4) c = c n , 0.25 < r < 0.50. 

n 

i 

* -r 

In order to avoid the necessity of computing n at each 

step of the estimation process (this requires a logarithm 
and an exponential to be calculated) we adopt instead the 
sequence defined recursively by 



( 5 ) 



e = 

1 ' 



1 , 



e 

n+ 1 



( 1 



e 3 

n ) e 
3 ~ n 



This sequence requires only elementary arithmetic operations 
and may be generated about 100 times faster than the 
sequence (4) . 



The properties of {e } may be readily found. First we 

n 

note that e > 0 and that e < e for all n, i.e. the 

n n+1 n 

sequence is bounded below and monotone decreasing. At stage 
n suppose that 



-1/3 -4/3 

e = n + o ( n ) ; 

n 
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then using (5) we have 



-1/3 



-4/3 



e 




+ o ( n 



+ o ( n 



-4/3 



) • 



Thus taking c = c e results in a Venter process with r = 
n n 

1/3; Venter's proof [20] allows for gain sequences of this 
form. 

Selection of the modulating constant c is the next 

problem. Intuitively it seems that c should be larger when 

the X population is more widely dispersed in the vicinity of 

s ; thus c = 1 / p would be a reasonable choice except that 
a 

p is usually unknown. We might thus decide to estimate p 

from the same initial sample as s and so use a random value 

1 

for c or else choose a reasonably robust fixed value for c. 

It turns out that the behavior of the Venter quantile 

estimator is bad regardless of the value chosen for c. The 

selection of c, however, does not seem to influence the 

estimation process as much as the bounding process (1.25) or 

(1.29). Venter's convergence proof required that the 

estimate A of 0 be restricted to the interval (a*,b*) [37] 
n 

while Fabian [9] showed that we may take 



-L 



a* = C n 

1 



( 6 ) 



b* = C In (n + 1) . 
2 
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It has been found empirically that in most applications only 
the lower bound a* is essential, though the upper bound 
improves the estimates somewhat. Following the discussion 
of the previous Section we can understand the function of 
the lower bound as limiting the size of the steps which we 
allow the Venter process to take. 

We may generate the bounds (6) by using the {e } 

n 

sequence (5) with the multiplier for a* and the sequence 

{H } defined by 
n 



(7) H = J i-» 

n i=1 

with the multiplier C for b*. It is well known that 

2 

H =lnn+y + 0 ( n~ 1 ), 
n 

where Y = 0.57722 is Euler's constant; this approach is 
about 20 times faster than computing the logarithm directly 
but still preserves the asymptotic behavior required, for 
example, in the proof cf Theorem 1 in Chapter II. 



2. Simulation results 

Considerable simulation effort was devoted to 

investigating optimum values for c, C^ and C^; in general, 

it was found that the Venter estimator is not very robust 

when random values are used and that it is difficult to 
select fixed values which give good results in a variety of 
applications. Figure 7 shows a typical example of the 
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Venter estimator with c = C =1 and C =2 applied to the 

1 2 

0.99 guantile of the exponential distribution. It was found 



that increasing the value of C decreased the spread of the 



estimator somewhat while altering the value of C seems to 



have little effect on the distribution of s* . 

n 
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Increasing does improve the distributional 

properties of the Venter quantile estimator but only at the 
cost of introducing considerable bias into the estimation 
process. In fact, the Venter estimator seems to be 
particularly bias-prone. In pseudo-random sampling 
experiments in which several quantiles from normal, uniform, 
exponential and gamma populations were estimated it was 
found that the Venter estimators had biases which were from 
50 to 1000 times as high as those of the EM estimators. 

A further drawback to this method may be seen in Figure 

8 which displays the density estimate A 1 obtained in the 

n 

same sampling experiment as the quantile estimates of Figure 

7 (the notation A' indicates that the estimate is based on a 

n 

maximum transform scheme) . The negative estimate values for 

0 = f (s ) are quite common for the Venter procedure, but 

a 

they prevent us from obtaining any reasonable estimate of 

the variance of s’. We denote Var[s' ] by az and based on 

n n n 

the asymptotic theory we estimate this variance by 



a v v 

(8) = va M-a L 

n n A” 

n 



where v is the size of the maximum transform s 



Normally, the larqer A' is the less variable is 5* but 

n n 



A* < 0 we can say very little about a 2 , 
n n 



ample. 

when 



91 



Non-pararaetric Quantile Estimation Through 
Stochastic Approximation 



Note that the appearance of Figure 8 is quite Gaussian 

and that the mean of A' is, very close to the theoretical 

n 

value for the exponential 0.99 quantile 



v— 1 

& • = va f (s ) 
a 

= 0.3222 . 
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Figure 8. Venter estimator A* of f (s ) for the 
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observations; histogram based on 2500 observations of A' 
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The distribution of A* thus agrees with the asymptotic 

n 

results of Venter [37] but the negative values are 

unacceptable for the determination of confidence intervals 

or for assessing the variability of s’ . 

n 



D. The New Estimator 



1. Choice of parameters 



To use the new estimator of Section I.E and Chapter II 

we must first decide on a number of parameters, just as in 

the Venter case. These decisions include the choice of a 

kernel function W(«) and a bandwidth sequence [b { as well 

n 

as the specification of the bounding method (2.4), analogous 
to the interval (a*,b*) for the Venter process. 



Considerable experience with density estimators, both 
in this thesis and in [23], indicates that the triangular 
weight function 

1 - | x| if | x | < 1 
(9) W (x) = . 

t 0 otherwise 



gives results comparable to those of smoother kernels with 
some saving in computational efficiency. Other kernels 
investigated include the uniform 

1 if -1/2 < x < 1/2 

W (x) = 

u 0 otherwise 



which is somewhat unstable and subject to bias, as well as 
the smoother quadratic weight function 
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1.5 ( 1 - x2 ) if I x | < 1 

W (x) = 

2 0 otherwise 



and the exponential weight function 



-lx 

W (x) = 1 e 

e 2 



All of these functions clearly satisfy assumptions (HI) to 
(W4) of Chapter II and so are admissible for stochastic 
approximation quantile estimation. 



For the bandwidth sequence we again adopt the [e } 

n 

sequence used for the Venter case. Selection of b = b e 

n n 

satisfies (2.7) with g = 1/3; once again, the savings in 



computation time make the use of the {e } sequence very 

n 

attractive. As an alternative we might use the sequence 



{e 1 } based on the recursion 
n 



e* = 1 
1 

e' 2 

e* = ( 1 - n_ ) e' 
n+ 1 2 n 



- 1/2 

which may be shown to be 0(n ) . Since excellent results 

were obtained with [e } this other sequence has not been 

n 

investigated . 



Selection of the bandwidth multiplier b must take into 
account the spread of the random variables. If too small a 
value is used it is unlikely that any X observations will 

fall close enough to the s values to make a contribution to 

n 
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the density estimate. (Recall that 

r s — X i 

(10) v = 1 W n n 

n IP t t 5 J 

n n 

will be positive only if |s -X | < b .) On the other hand, 

n n n 

if b is too large it is possible that B will be unable to 

n 

increase fast enough in a small sample to reach very large 
values of /3 . 

Practical experience with the method shows it to be 
quite robust with respect to the choice of b; most of the 
work reported in this Chapter and in Chapter IV was done 
with a fixed b value of 1 . In data where the observations 
are more widely dispersed than those considered here, it may 
be desirable to use a random value for b. If the nested 

method is used for finding s a convenient b value to use is 

1 

b = X» - X' 

(3) (1) 

using this value guarantees that further X' observations 

will be within a single bandwidth b of §' . 

n n 

The lower bound on th^ sequential estimate A of fi in 

n 

the Venter process was absolutely essential since the Venter 

technique sometimes results in negative A values. If these 

n 

values were used, steps in the wrong direction would be 

taken and so a lower bound on the value of A must be 

n 

established. For the new process all of the increments to 
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the density estimator B are positive, so that once a 

n 

positive estimate is obtained we need not be concerned with 

this type of behavior. We may assure that the s estimator 

n 

will be fairly stable by setting the initial value of B , 

n 

which we call B , to a positive value: either some random a 

0 

priori estimate of /3 or else a fixed number. The larger 

the bandwidth sequence multiplier b is, the smaller the 

value for B we want to use. We thus set B = 1/b whether b 
0 0 

is fixed or random. 



As mentioned above, we adopt here the fixed values 

b = B =1. i.e. we use the estimate B criven by 
0 n 



b = i r i 

n n L 

where w is given by (10). 

j 

a lower bound with C =1 

1 

satisfy the requirements of 

investigated so far do not 
method. 




Note that this is equivalent to 
and L = 1; although this does not 
(2.4) the results in all cases 
seem to call for a more stringent 



For an upper bound we again adopt the {H } sequence 

n 

used in the Venter case, using a C value of 1. Although 

2 

the upper bound makes very little difference in most cases 

it seems prudent to use it to avoid any possible instability 
in the early phases of the estimation process. 
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2. The basic stochastic approximation algorithm 

A succinct description of the estimation process may 
now be given by setting forth its three phases as 
follows. (Note that the same basic method holds for both 
untransformed and maximum transformed estimators.) For 
notational simplicity, we write "m" for B in the algorithm. 



1. Initia lize. Obtain the initial estimate s^ and the 

bandwidth multiplier m and initialize: 

s = s ; f = 1 / m ; 

1 

n = 1 ; b = m; h = 1 . 

2. *lE<l a t§.s For each new X observation update the 

estimates as follows: 

a. Dens ity Set t=|s-X|. If t < b increase 

f = f + (b-t) . 

b. Quantile If X < s set y = a- 1 otherwise set y = a. 

If f > h«n set d = h*n otherwise set d = f (this is the 

upper bound operation) . Finally adjust s according to 

s = s + y / d. 

c. Constants Update the constants for the next phase: 

h=h+1/n; n = n + 1 ; 

b = ( 1 - b 3 / 3m 3 ) b. 

3. T he final estimate of the a-guantile is s. 
An estimate of Var[s] is given by 

Var[s] = (n-1) a (1-a) / f 2 . 
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while f/(n-1) 



is an estimate of f (s ). 

a 



The 
var iable 
values 



process 
values (s, 
(a and m) 



thus requires us to store just five 
f, n, b and h) and a pair of fixed 
. After the kth X value has been used in 



step 2, s has the value s , f is k B , 

k+1 k 



n is k+1, b is e 



k+ 1 
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and h is H 

k + 1 



To carry out the maximum (minimum) transform with 

v v 

sections of size v, we use the value a* = a (a' = (1-a) ) 

in steps 2. b and 3 and carry out step 2 only for each of the 

section maxima (minima) . The estimate of f (s ) in step 3 is 

a 

v- 1 v- 1 

then f/[ va (n— 1 ) ] {or f/[v(1-a) (n— 1 ) ] for the minimum 

case] . Here we will require one more constant (v) to be 
stored as well as two more variables which keep track of the 
number of observations considered so far in the current 
section and the value of the maximum (minimum) value 
encountered . 



3. Simulation results 

An example of the new stochastic approximation quantile 
estimator applied to the 0-99 quantile of the exponential 
distribution appears in Figure 9. The asymptotic variance 
for this maximum transformed case is 



(11) Var[ s ’ ] 
n 



v-T 2 

n {va f (s ) } 



a 



= 2.3615 
n” 

or 0.02362 for n = 100. This corresponds quite closely to 
the observed value of 0.02435 and the shape of the histogram 
also appears reasonably Saussian. We thus conclude that the 
asymptotic theory is a generally acceptable description of 
the behavior of the new stochastic approximation quantile 
estimation scheme for moderately large samples. 
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Comparing this distribution to that of the 
corresponding Venter estimate {Figure 7) we see that the new 
method results in an estimator whose properties are much 
more reasonable; the observed mean bias is less for the new 
estimator while the variance is smaller by a factor of 7. 
The distribution also appears much more Gaussian and the 
sample coefficients of skewness and kurtosis are smaller in 
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* 

Figure 9. We conclude that the new procedure is decidedly 
better than the Venter technique for quantile estimation. 

Figure 10 shows the distribution of the density 

estimate B' (or f/(n-1) from the algorithm) which was 
n 

obtained at the same time as the data of Figure 9. Once 
again the distribution appears approximately Gaussian and 
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Figure 11. Bias of the stochastic approximation quantile 

estimator for the 0.99 quantile of the exponential 

distribution based on an X sample of 5768 observations. 

Maximum transformation was not used in this case. Histogram 

sample size = 2500 observations on s* 
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the observed mean of 0.3264 
theoretical value of 0.3222. 
Theorem 4 the variance should be 



is quite close to 
On asymptotic grounds 
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which is very close to'the observed value of 7.612 X 10~ 3 . 

Also there are no negative values of B 1 so that all of them 

n 

are admissible as variance estimators. 



The new estimator was also applied to the 0.99 quantile 
of the exponential distribution without using the maximum 
transform; the results appear in Figure 11. Clearly the new 
process is far more stable than either the RM or Venter 

methods; the distribution of s is very nearly normal 

5600 

with an observed variance (0.02328) close to the asymptotic 



value (0.01768) . The density estimate B for this case 

5600 

is shown in Figure 12; the mean is close to the true value 

of 0.01 while the observed variance of 2.07X 10 -5 is also 
close to the asymptotic value of 1.59 X 10~ 5 although the 
distribution is skewed to the right and does not appear 
Gaussian. 



Despite the results of Figures 11 and 12 we still 
prefer to use the maximum transformed version of the new 
process both because it is computationally faster and 
because its finite sample properties are generally superior, 
especially for quantiles more extreme than the 0.99. It is 
nevertheless encouraging to find the new process 
sufficiently stable to avoid the very heavy tails displayed 
by the untransformed RM estimator (see Figures 2 and 3) . 
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4. The stability of the new estimator 

An explanation of the stability displayed in Figure 11 
follows if we consider the role of the variable f in the 
algorithmic description of the new method given above. 

Recall that f = n B , i.e. it is the divisor in the basic 

n 



stochastic approximation recurrence relation. Now f will 

increase at each step when we use the triangular kernel 

function only as long as the latest X observation is close 

n 

to s . If f does not increase, however, the size of the 
n 

steps taken by the process will remain the same; we thus 

have an analog to the accelerated process of Kesten [18] 
where the step size remains constant until we have straddled 
the true value by taking steps in both directions. 



The new method is an improvement on Kesten's technique 
because the step size adjustment here is made for each X 



observation. Instead of determining that the estimator s 



is in the vicinity of s by looking at the changes in step 



a 

direction we examine directly the 

and s . For example, if s is a 
n 1 

none of the X observations are near 
then the process will take steps 



relationship between X 

n 



long ways from 


s so that 


a 


s for small 


j values 


j 


of size 1/P = 


1 until it 


0 
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reaches a point where s is close to the latest observation 

n 



value X . Once the s values are close to the X 
n n n 



observations the w terms added to f will be positive and so 

n 

the step size will decrease. 



5. Confidence intervals 

The final area to be investigated here is that of 

applying the new estimation procedure to the determination 

of confidence intervals on s . To obtain a 100 p % 

a 

confidence interval on s we use 

a 

(12) s ± /12IEII B~ 1 u 

n + 1 </ n n p 

where u is the upper 1 - p/2 point of a standard normal 
P 

random variable. It would be possible to establish the 

asymptotic properties of confidence intervals estimated in 
this way following the work of Sielken [34]; this has not 
been done here. 

To investigate the finite sample properties of the 
confidence intervals in the exponential 0.99 case, however, 
further simulation experiments were undertaken. Based on 
10,000 replications the coverage of the confidence interval 
(12) for various p values was as follows: 
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2 
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The data of Figure 13 show the distribution of the upper 
95 % confidence limit (with the mean of 4.605165 subtracted) 
for a sample of 5768 X observations. On asymptotic grounds, 
the expected value for this limit should be 0.25271 which 
corresponds very well to the observed mean of 0.25623. 
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Figure 13. Value of the upper 95 % confidence limit for the 
0.99 quantile of the unit exponential distribution; the 
true value of 4.605165 has been subtracted from each 
observation. Estimated by stochastic approximation from X 
samples of 5768. Histogram sample size = 2500. 
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6. Summary 

The new estimator has been used to estimate all the 
quantiles in Table I for random variables from the uniform, 
normal, exponential, gamma and Cauchy distributions. So 
much data was obtained that it would be impractical to 
attempt to display it all here; the results were, with few 
exceptions, in general agreement with those shown here for 
the exponential 0.99 case. Serious irregularities were 
noted in the Cauchy case; these were due to the infinite 

variance of the initial estimate s^. When the Cauchy 

experiment was repeated with the fixed starting value s^ = 

0, however, reasonable agreement with the asymptotic theory 
was obtained. 

The other major limitation found was in using the 
maximum transform for the estimation of extreme quantiles 
from distributions whose densities do not approach zero in 
one or both tails; examples include the uniform distribution 
and the left-hand tail of the exponential distribution. In 
these cases the transformed density ft 1 is very large 
31.90 for the 0.01 quantile of the exponential distribution, 
for example - and it requires very large X samples for the 

value of B' to increase sufficiently to obtain good 
n 

estimates of ft ' . The resulting s values have 

n 

distributions which agree with the asymptotic theory, but 

the too-small density estimates result in confidence 
intervals which are much too wide. In other words, in this 
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situation the point estimator of s is satisfactory but the 

a 

density estimate (and hence the confidence interval) is 
relatively poor. 

We conclude this Chapter with the observation (based on 
the above digital simulation experience) that the new method 
overcomes most of the limitations of stochastic 

approximation techniques for quantile estimation. The 
asymptotic theory appears to be an adequate description of 
the behavior of the estimators in samples large enough to 
give reasonable variances and we are confident enough of the 

distribution of s that we may use the estimate of f(s ) for 

n a 

the construction of confidence intervals. 
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In the previous Chapter we examined the problem of 
finite sample performance of stochastic approximation 
quantile estimates by investigating the dist rib ution of the 

difference s* = s - s , which we refer to hereafter as the 
n n a 

bias of the estimator. Considering the distribution of s* 

n 

was done because simply looking at its expected value is not 

sufficient if one is to explain the extremely poor 

performance of some stochastic approximation quantile 

estimators. As illustrated by Figures 2 and 3, this poor 

performance is characterized by very heavy tails and 

exceptionally wide dispersion of s*. By using the maximum 

n 

transform, however, and the new technique of Section III.D, 

we were able to overcome these drawbacks and obtain 

estimates s whose distribution is approximately Gaussian, 
n 

Bias is usually taken to be E[s*] and once the problem 

n 

of extremely large deviations has been overcome it is 

necessary to look at bias in this average sense. This is 
because one facet of the poor performance of stochastic 

approximation quantile estimators is that convergence of 

E[ s ] to the true value s is very slow as measured 
n a 

empirically even though the estimates are asymptotically 
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